<!DOCTYPE html>
<html lang="en-us">
  <head>

    <meta http-equiv="content-type" content="text/html; charset=utf-8">
    
<meta charset="UTF-8">
<title>优化聚合查询 | Elasticsearch: 权威指南 | Elastic</title>
<link rel="home" href="index.html" title="Elasticsearch: 权威指南">
<link rel="up" href="docvalues-and-fielddata.html" title="Doc Values and Fielddata">
<link rel="prev" href="preload-fielddata.html" title="预加载 fielddata">
<link rel="next" href="_closing_thoughts.html" title="总结">
<meta name="DC.type" content="Learn/Docs/Elasticsearch/Definitive Guide">
<meta name="DC.subject" content="Elasticsearch">
<meta name="DC.identifier" content="2.x">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <script src="https://cdn.optimizely.com/js/18132920325.js"></script>
    <link rel="apple-touch-icon" sizes="57x57" href="/apple-icon-57x57.png">
    <link rel="apple-touch-icon" sizes="60x60" href="/apple-icon-60x60.png">
    <link rel="apple-touch-icon" sizes="72x72" href="/apple-icon-72x72.png">
    <link rel="apple-touch-icon" sizes="76x76" href="/apple-icon-76x76.png">
    <link rel="apple-touch-icon" sizes="114x114" href="/apple-icon-114x114.png">
    <link rel="apple-touch-icon" sizes="120x120" href="/apple-icon-120x120.png">
    <link rel="apple-touch-icon" sizes="144x144" href="/apple-icon-144x144.png">
    <link rel="apple-touch-icon" sizes="152x152" href="/apple-icon-152x152.png">
    <link rel="apple-touch-icon" sizes="180x180" href="/apple-icon-180x180.png">
    <link rel="icon" type="image/png" href="/favicon-32x32.png" sizes="32x32">
    <link rel="icon" type="image/png" href="/android-chrome-192x192.png" sizes="192x192">
    <link rel="icon" type="image/png" href="/favicon-96x96.png" sizes="96x96">
    <link rel="icon" type="image/png" href="/favicon-16x16.png" sizes="16x16">
    <link rel="manifest" href="/manifest.json">
    <meta name="apple-mobile-web-app-title" content="Elastic">
    <meta name="application-name" content="Elastic">
    <meta name="msapplication-TileColor" content="#ffffff">
    <meta name="msapplication-TileImage" content="/mstile-144x144.png">
    <meta name="theme-color" content="#ffffff">
    <meta name="naver-site-verification" content="936882c1853b701b3cef3721758d80535413dbfd">
    <meta name="yandex-verification" content="d8a47e95d0972434">
    <meta name="localized" content="true">
    <meta name="st:robots" content="follow,index">
    <meta property="og:image" content="https://www.elastic.co/static/images/elastic-logo-200.png">
    <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon">
    <link rel="icon" href="/favicon.ico" type="image/x-icon">
    <link rel="apple-touch-icon-precomposed" sizes="64x64" href="/favicon_64x64_16bit.png">
    <link rel="apple-touch-icon-precomposed" sizes="32x32" href="/favicon_32x32.png">
    <link rel="apple-touch-icon-precomposed" sizes="16x16" href="/favicon_16x16.png">
    <!-- Give IE8 a fighting chance -->
    <!--[if lt IE 9]>
    <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
    <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
    <link rel="stylesheet" type="text/css" href="/guide/static/styles.css">
  </head>

  <!--© 2015-2021 Elasticsearch B.V. Copying, publishing and/or distributing without written permission is strictly prohibited.-->

  <body>
    <!-- Google Tag Manager -->
    <script>dataLayer = [];</script><noscript><iframe src="//www.googletagmanager.com/ns.html?id=GTM-58RLH5" height="0" width="0" style="display:none;visibility:hidden"></iframe></noscript>
    <script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start': new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0], j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src= '//www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f); })(window,document,'script','dataLayer','GTM-58RLH5');</script>
    <!-- End Google Tag Manager -->

    <!-- Global site tag (gtag.js) - Google Analytics -->
    <script async src="https://www.googletagmanager.com/gtag/js?id=UA-12395217-16"></script>
    <script>
      window.dataLayer = window.dataLayer || [];
      function gtag(){dataLayer.push(arguments);}
      gtag('js', new Date());
      gtag('config', 'UA-12395217-16');
    </script>

    <!--BEGIN QUALTRICS WEBSITE FEEDBACK SNIPPET-->
    <script type="text/javascript">
      (function(){var g=function(e,h,f,g){
      this.get=function(a){for(var a=a+"=",c=document.cookie.split(";"),b=0,e=c.length;b<e;b++){for(var d=c[b];" "==d.charAt(0);)d=d.substring(1,d.length);if(0==d.indexOf(a))return d.substring(a.length,d.length)}return null};
      this.set=function(a,c){var b="",b=new Date;b.setTime(b.getTime()+6048E5);b="; expires="+b.toGMTString();document.cookie=a+"="+c+b+"; path=/; "};
      this.check=function(){var a=this.get(f);if(a)a=a.split(":");else if(100!=e)"v"==h&&(e=Math.random()>=e/100?0:100),a=[h,e,0],this.set(f,a.join(":"));else return!0;var c=a[1];if(100==c)return!0;switch(a[0]){case "v":return!1;case "r":return c=a[2]%Math.floor(100/c),a[2]++,this.set(f,a.join(":")),!c}return!0};
      this.go=function(){if(this.check()){var a=document.createElement("script");a.type="text/javascript";a.src=g;document.body&&document.body.appendChild(a)}};
      this.start=function(){var a=this;window.addEventListener?window.addEventListener("load",function(){a.go()},!1):window.attachEvent&&window.attachEvent("onload",function(){a.go()})}};
      try{(new g(100,"r","QSI_S_ZN_emkP0oSe9Qrn7kF","https://znemkp0ose9qrn7kf-elastic.siteintercept.qualtrics.com/WRSiteInterceptEngine/?Q_ZID=ZN_emkP0oSe9Qrn7kF")).start()}catch(i){}})();
    </script><div id="ZN_emkP0oSe9Qrn7kF"><!--DO NOT REMOVE-CONTENTS PLACED HERE--></div>
    <!--END WEBSITE FEEDBACK SNIPPET-->

    <div id="elastic-nav" style="display:none;"></div>
    <script src="https://www.elastic.co/elastic-nav.js"></script>

    <!-- Subnav -->
    <div>
      <div>
        <div class="tertiary-nav d-none d-md-block">
          <div class="container">
            <div class="p-t-b-15 d-flex justify-content-between nav-container">
              <div class="breadcrum-wrapper"><span><a href="/guide/" style="font-size: 14px; font-weight: 600; color: #000;">Docs</a></span></div>
            </div>
          </div>
        </div>
      </div>
    </div>

    <div class="main-container">
      <section id="content">
        <div class="content-wrapper">

          <section id="guide" lang="zh_cn">
            <div class="container">
              <div class="row">
                <div class="col-xs-12 col-sm-8 col-md-8 guide-section">
                  <!-- start body -->
                  <div class="page_header">
<b>请注意:</b><br>本书基于 Elasticsearch 2.x 版本，有些内容可能已经过时。
</div>
<div id="content">
<div class="breadcrumbs">
<span class="breadcrumb-link"><a href="index.html">Elasticsearch: 权威指南</a></span>
»
<span class="breadcrumb-link"><a href="aggregations.html">聚合</a></span>
»
<span class="breadcrumb-link"><a href="docvalues-and-fielddata.html">Doc Values and Fielddata</a></span>
»
<span class="breadcrumb-node">优化聚合查询</span>
</div>
<div class="navheader">
<span class="prev">
<a href="preload-fielddata.html">« 预加载 fielddata</a>
</span>
<span class="next">
<a href="_closing_thoughts.html">总结 »</a>
</span>
</div>
<div class="section">
<div class="titlepage"><div><div>
<h2 class="title">
<a id="_preventing_combinatorial_explosions"></a>优化聚合查询<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elasticsearch-cn/elasticsearch-definitive-guide/edit/cn/300_Aggregations/120_breadth_vs_depth.asciidoc">edit</a>
</h2>
</div></div></div>
<p>“elasticsearch 里面桶的叫法和 SQL 里面分组的概念是类似的，一个桶就类似 SQL 里面的一个 group，多级嵌套的 aggregation，
类似 SQL 里面的多字段分组（group by field1,field2, …​..），注意这里仅仅是概念类似，底层的实现原理是不一样的。 －译者注”</p>
<p><code class="literal">terms</code> 桶基于我们的数据动态构建桶；它并不知道到底生成了多少桶。 大多数时候对单个字段的聚合查询还是非常快的，
但是当需要同时聚合多个字段时，就可能会产生大量的分组，最终结果就是占用 es 大量内存，从而导致 OOM 的情况发生。</p>
<p>假设我们现在有一些关于电影的数据集，每条数据里面会有一个数组类型的字段存储表演该电影的所有演员的名字。</p>
<div class="pre_wrapper lang-js">
<pre class="programlisting prettyprint lang-js">{
  "actors" : [
    "Fred Jones",
    "Mary Jane",
    "Elizabeth Worthing"
  ]
}</pre>
</div>
<p>如果我们想要查询出演影片最多的十个演员以及与他们合作最多的演员，使用聚合是非常简单的：</p>
<div class="pre_wrapper lang-js">
<pre class="programlisting prettyprint lang-js">{
  "aggs" : {
    "actors" : {
      "terms" : {
         "field" : "actors",
         "size" :  10
      },
      "aggs" : {
        "costars" : {
          "terms" : {
            "field" : "actors",
            "size" :  5
          }
        }
      }
    }
  }
}</pre>
</div>
<p>这会返回前十位出演最多的演员，以及与他们合作最多的五位演员。这看起来是一个简单的聚合查询，最终只返回 50 条数据！</p>
<p>但是， 这个看上去简单的查询可以轻而易举地消耗大量内存，我们可以通过在内存中构建一个树来查看这个 <code class="literal">terms</code> 聚合。
 <code class="literal">actors</code> 聚合会构建树的第一层，每个演员都有一个桶。然后，内套在第一层的每个节点之下， <code class="literal">costar</code> 聚合会构建第二层，每个联合出演一个桶，请参见 <a class="xref" href="_preventing_combinatorial_explosions.html#depth-first-1" title="Build full depth tree">Figure 42, “Build full depth tree”</a> 所示。这意味着每部影片会生成 n<sup>2</sup> 个桶！</p>
<div id="depth-first-1" class="imageblock">
<div class="content">
<img src="images/300_120_depth_first_1.svg" alt="Build full depth tree">
</div>
<div class="title">Figure 42. Build full depth tree</div>
</div>
<p>用真实点的数据，设想平均每部影片有 10 名演员，每部影片就会生成 10<sup>2</sup> == 100 个桶。如果总共有 20，000 部影片，粗率计算就会生成 2，000，000 个桶。</p>
<p>现在，记住，聚合只是简单的希望得到前十位演员和与他们联合出演者，总共 50 条数据。为了得到最终的结果，我们创建了一个有 2，000，000 桶的树，然后对其排序，取 top10。
图 <a class="xref" href="_preventing_combinatorial_explosions.html#depth-first-2" title="Sort tree">Figure 43, “Sort tree”</a> 和图 <a class="xref" href="_preventing_combinatorial_explosions.html#depth-first-3" title="Prune tree">Figure 44, “Prune tree”</a> 对这个过程进行了阐述。</p>
<div id="depth-first-2" class="imageblock">
<div class="content">
<img src="images/300_120_depth_first_2.svg" alt="Sort tree">
</div>
<div class="title">Figure 43. Sort tree</div>
</div>
<div id="depth-first-3" class="imageblock">
<div class="content">
<img src="images/300_120_depth_first_3.svg" alt="Prune tree">
</div>
<div class="title">Figure 44. Prune tree</div>
</div>
<p>这时我们一定非常抓狂，在 2 万条数据下执行任何聚合查询都是毫无压力的。如果我们有 2 亿文档，想要得到前 100 位演员以及与他们合作最多的 20 位演员，作为查询的最终结果会出现什么情况呢？</p>
<p>可以推测聚合出来的分组数非常大，会使这种策略难以维持。世界上并不存在足够的内存来支持这种不受控制的聚合查询。</p>
<div class="section">
<div class="titlepage"><div><div>
<h3 class="title">
<a id="_深度优先与广度优先depth_first_versus_breadth_first"></a>深度优先与广度优先（Depth-First Versus Breadth-First）<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elasticsearch-cn/elasticsearch-definitive-guide/edit/cn/300_Aggregations/120_breadth_vs_depth.asciidoc">edit</a>
</h3>
</div></div></div>
<p>Elasticsearch 允许我们改变聚合的 <em>集合模式</em> ，就是为了应对这种状况。
我们之前展示的策略叫做 <em>深度优先</em> ，它是默认设置， 先构建完整的树，然后修剪无用节点。 <em>深度优先</em> 的方式对于大多数聚合都能正常工作，但对于如我们演员和联合演员这样例子的情形就不太适用。</p>
<p>为了应对这些特殊的应用场景，我们应该使用另一种集合策略叫做 <em>广度优先</em> 。这种策略的工作方式有些不同，它先执行第一层聚合， <em>再</em> 继续下一层聚合之前会先做修剪。
图 <a class="xref" href="_preventing_combinatorial_explosions.html#breadth-first-1" title="Build first level">Figure 45, “Build first level”</a> 和图 <a class="xref" href="_preventing_combinatorial_explosions.html#breadth-first-3" title="Prune first level">Figure 47, “Prune first level”</a> 对这个过程进行了阐述。</p>
<p>在我们的示例中， <code class="literal">actors</code> 聚合会首先执行，在这个时候，我们的树只有一层，但我们已经知道了前 10 位的演员！这就没有必要保留其他的演员信息，因为它们无论如何都不会出现在前十位中。</p>
<div id="breadth-first-1" class="imageblock">
<div class="content">
<img src="images/300_120_breadth_first_1.svg" alt="Build first level">
</div>
<div class="title">Figure 45. Build first level</div>
</div>
<div id="breadth-first-2" class="imageblock">
<div class="content">
<img src="images/300_120_breadth_first_2.svg" alt="Sort first level">
</div>
<div class="title">Figure 46. Sort first level</div>
</div>
<div id="breadth-first-3" class="imageblock">
<div class="content">
<img src="images/300_120_breadth_first_3.svg" alt="Prune first level">
</div>
<div class="title">Figure 47. Prune first level</div>
</div>
<p>因为我们已经知道了前十名演员，我们可以安全的修剪其他节点。修剪后，下一层是基于 <em>它的</em> 执行模式读入的，重复执行这个过程直到聚合完成，如图 <a class="xref" href="_preventing_combinatorial_explosions.html#breadth-first-4" title="Populate full depth for remaining nodes">Figure 48, “Populate full depth for remaining nodes”</a> 所示。
这种场景下，广度优先可以大幅度节省内存。</p>
<div id="breadth-first-4" class="imageblock">
<div class="content">
<img src="images/300_120_breadth_first_4.svg" alt="Step 4: populate full depth for remaining nodes">
</div>
<div class="title">Figure 48. Populate full depth for remaining nodes</div>
</div>
<p>要使用广度优先，只需简单  的通过参数 <code class="literal">collect</code> 开启：</p>
<div class="pre_wrapper lang-js">
<pre class="programlisting prettyprint lang-js">{
  "aggs" : {
    "actors" : {
      "terms" : {
         "field" :        "actors",
         "size" :         10,
         "collect_mode" : "breadth_first" <a id="CO225-1"></a><i class="conum" data-value="1"></i>
      },
      "aggs" : {
        "costars" : {
          "terms" : {
            "field" : "actors",
            "size" :  5
          }
        }
      }
    }
  }
}</pre>
</div>
<div class="calloutlist">
<table border="0" summary="Callout list">
<tr>
<td align="left" valign="top" width="5%">
<p><a href="#CO225-1"><i class="conum" data-value="1"></i></a></p>
</td>
<td align="left" valign="top">
<p>按聚合来开启 <code class="literal">breadth_first</code> 。</p>
</td>
</tr>
</table>
</div>
<p>广度优先仅仅适用于每个组的聚合数量远远小于当前总组数的情况下，因为广度优先会在内存中缓存裁剪后的仅仅需要缓存的每个组的所有数据，以便于它的子聚合分组查询可以复用上级聚合的数据。</p>
<p>广度优先的内存使用情况与裁剪后的缓存分组数据量是成线性的。对于很多聚合来说，每个桶内的文档数量是相当大的。
想象一种按月分组的直方图，总组数肯定是固定的，因为每年只有12个月，这个时候每个月下的数据量可能非常大。这使广度优先不是一个好的选择，这也是为什么深度优先作为默认策略的原因。</p>
<p>针对上面演员的例子，如果数据量越大，那么默认的使用深度优先的聚合模式生成的总分组数就会非常多，但是预估二级的聚合字段分组后的数据量相比总的分组数会小很多所以这种情况下使用广度优先的模式能大大节省内存，从而通过优化聚合模式来大大提高了在某些特定场景下聚合查询的成功率。</p>
</div>

</div>
<div class="navfooter">
<span class="prev">
<a href="preload-fielddata.html">« 预加载 fielddata</a>
</span>
<span class="next">
<a href="_closing_thoughts.html">总结 »</a>
</span>
</div>
</div>

                  <!-- end body -->
                </div>
                <div class="col-xs-12 col-sm-4 col-md-4" id="right_col">
                  <div id="rtpcontainer" style="display: block;">
                    <div class="mktg-promo">
                      <h3>Most Popular</h3>
                      <ul class="icons">
                        <li class="icon-elasticsearch-white"><a href="https://www.elastic.co/webinars/getting-started-elasticsearch?baymax=default&amp;elektra=docs&amp;storm=top-video">Get Started with Elasticsearch: Video</a></li>
                        <li class="icon-kibana-white"><a href="https://www.elastic.co/webinars/getting-started-kibana?baymax=default&amp;elektra=docs&amp;storm=top-video">Intro to Kibana: Video</a></li>
                        <li class="icon-logstash-white"><a href="https://www.elastic.co/webinars/introduction-elk-stack?baymax=default&amp;elektra=docs&amp;storm=top-video">ELK for Logs &amp; Metrics: Video</a></li>
                      </ul>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </section>

        </div>


<div id="elastic-footer"></div>
<script src="https://www.elastic.co/elastic-footer.js"></script>
<!-- Footer Section end-->

      </section>
    </div>

<script src="/guide/static/jquery.js"></script>
<script type="text/javascript" src="/guide/static/docs.js"></script>
<script type="text/javascript">
  window.initial_state = {}</script>
  </body>
</html>
